Modified-prior i-vector estimation for language identification of short duration utterances
نویسندگان
چکیده
In this paper, we address the problem of Language Identification (LID) on short duration segments. Current state-of-the-art LID systems typically employ total variability i-Vector modeling for obtaining fixed length representation of utterances. However, when the utterances are short, only a small amount of data is available, and the estimated i-Vector representation will consequently exhibit significant variability, making the identification problem challenging. In this paper, we propose novel techniques to modify the standard normal prior distribution of the i-Vectors, to obtain a more discriminative i-Vector extraction given the small amount of available utterance data. Improved performance was observed by using the proposed i-Vector estimation techniques on short segments of the DARPA RATS corpora, with lengths as small as 3 seconds.
منابع مشابه
Bidirectional Modelling for Short Duration Language Identification
Language identification (LID) systems typically employ ivectors as fixed length representations of utterances. However, it may not be possible to reliably estimate i-vectors from short utterances, which in turn could lead to reduced language identification accuracy. Recently, Long Short Term Memory networks (LSTMs) have been shown to better model short utterances in the context of language iden...
متن کاملModified-prior PLDA and score calibration for duration mismatch compensation in speaker recognition system
To deal with the performance degradation of speaker recognition due to duration mismatch between enrollment and test utterances, a novel strategy to modify the standard normal prior distribution of the i-vector during probabilistic linear discriminant analysis (PLDA) modeling is employed. This new modified-prior PLDA model incorporates the covariance matrix scaled with duration of each utteranc...
متن کاملAccounting for uncertainty of i-vectors in speaker recognition using uncertainty propagation and modified imputation
One of the biggest challenges in speaker recognition is incomplete observations in test phase caused by availability of only short duration utterances. The problem with short utterances is that speaker recognition needs to be handled by having information from only limited amount of acoustic classes. By considering limited observations from a test speaker, the resulting i-vector as a representa...
متن کاملمقایسه روش های طیفی برای شناسایی زبان گفتاری
Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...
متن کاملUBM fused total variability modeling for language identification
This paper proposes Universal Background Model (UBM) fusion in the framework of total variability or i-vector modeling with the application to language identification (LID). The total variability subspace which is typically exploited to discriminate between the language classes of different speech recordings, is trained by combining the normalized Baum-Welch statistics of multiple UBMs. When th...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014